Enhancing Communication in Noisy Environments
Abstract
Military personnel deployed in environments characterized by high-level noise require personal hearing protection devices, which may limit their capabilities for auditory detection, sound localization and verbal communication. This limitation may affect both the outcome of the mission and personal safety. Because of the noise, and also because of the high incidence of hearing loss, members have to shout to be heard, and there is a high probability that commands, whether delivered face-to-face or by radio, will be misunderstood. Current hearing protection and listening devices are often incompatible with other gear and may reduce situational awareness. As a result, personnel often dispense with hearing protection to improve operational effectiveness, and suffer hearing damage. A research study is underway that explores an alternative approach: existing in-ear communication systems that incorporate active noise reduction, combined with additional signal processing algorithms. The goal of the system is to suppress background noise while enhancing speech in order to improve face-to-face communication in noisy environments, with the expectation that such a system would allow hearing protection to be worn more consistently. The proposed system uses audio signals collected by an array of microphones mounted on a helmet. Integrating the system into a helmet is intended to improve compatibility with regular gear. The microphone array permits sound localization, the steering of acoustic listening beams in specific directions, and the suppression of interference from the surrounding high-level ambient noise. We review currently available hearing protection technologies, assess their strengths and weaknesses, and motivate the need for speech enhancement technologies. We then describe the prototype system that is currently under development.
Defence R&D Canada Toronto, 1133 Sheppard Ave West, P.O. Box 2000, Toronto, ON, Canada, M3K 2C9. Report date: October 2009. Published in RTO-MP-HFM-181, Human Performance Enhancement for NATO Military Operations (Science, Technology and Ethics), RTO Human Factors and Medicine Panel (HFM) Symposium held in Sofia, Bulgaria, on 5-7 October 2009. Approved for public release, distribution unlimited. See also ADA562561.

1.0 INTRODUCTION

Military personnel are exposed in the course of their work to a wide range of noise environments, including high-level machine noise and impulsive sounds from explosives and firearms. It is therefore important that they wear effective hearing protection. This safety requirement competes, however, with the desire for unobstructed communication, both radio-based and face-to-face, and with the need to preserve and enhance situational awareness. Hearing protection tends to impair sound detection and source localization, both of which contribute to situational awareness. Some hearing protection devices (HPDs) can be made compatible with military radio units, but effective hearing protection generally inhibits face-to-face verbal communication.

We are faced, therefore, with two objectives: to provide effective hearing protection, and to provide effective communication channels and situational awareness. With conventional HPDs these two aims are at odds, since an improvement in the former implies a decline in the latter. In such cases, the goal must be a pragmatic balance between the two. Another possibility, however, is the development of advanced hearing protection that lessens the conflict between these two objectives. Electronic pass-through hearing protection (EPHP) devices take a step in this direction: they electronically filter environmental noise, and only the filtered result passes through to the listener.
It may be possible to develop intelligent audio filters that preserve communication and situational awareness while still providing protection against harmful or distracting environmental noise.

In this paper, we first provide an overview of hearing protection issues in a military context. We then review existing hearing protection technologies, from simple barrier devices to more sophisticated EPHP devices. An overview of recent efforts to overcome the shortcomings of existing technologies follows. Finally, we describe a project currently underway at Defence Research & Development Canada to develop a helmet-mounted speech enhancement system that promotes improved face-to-face communication in noisy military environments.

2.0 HEARING PROTECTION IN A MILITARY CONTEXT

2.1 The problem of hearing damage

Generally, daily exposure to continuous (8-hour) sound levels in excess of 85 dBA, or to the 8-hour energy equivalent in the case of sporadic or impulse noise, will result in noise-induced hearing loss after 3-4 years [1, 2]. Initially, the hearing loss manifests as a notch in the audiogram in the region of 4 kHz [3]. This reflects the natural resonance of the ear canal at 3.8 kHz and the transfer function of the middle ear [4]. Over time, the notch deepens and the hearing loss spreads to both higher and lower frequencies. A recently published study [5] examined a sample of Canadian Forces (CF) members working in land, air and maritime trades. It reported that by midlife (46 years and older), 42% had acquired a hearing loss greater than 25 dB, the clinical criterion for diagnosis of hearing loss [6]. This is consistent with data collected from U.S. Army personnel in the 1970s, which showed that among soldiers with 15 or more years of service, more than 50% were hearing-impaired [7]. Noise-induced hearing loss may be prevented either by reducing the noise at source or by wearing personal HPDs.
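The "8-hour energy equivalent" in Section 2.1 can be made concrete with the equal-energy formula, L_EX,8h = 10 log10((1/8) Σ t_i 10^(L_i/10)), with each duration t_i in hours. This formula corresponds to a 3 dB exchange rate. The sketch below assumes that reading of the criterion; the function names are hypothetical, and individual hearing conservation programs may use other criteria or exchange rates.

```python
import math


def lex_8h(exposures):
    """Daily noise exposure normalized to an 8-hour working day, in dBA.

    exposures: iterable of (level_dba, duration_hours) pairs.
    Equal-energy principle: each exposure contributes sound energy
    proportional to duration * 10**(level / 10).
    """
    energy = sum(hours * 10 ** (level / 10) for level, hours in exposures)
    if energy <= 0:
        raise ValueError("need at least one exposure with a positive duration")
    return 10 * math.log10(energy / 8.0)


def allowed_hours(level_dba, criterion_dba=85.0):
    """Hours at a constant level that deliver the same energy as 8 h at the criterion."""
    return 8.0 * 10 ** ((criterion_dba - level_dba) / 10)


# One hour of 95 dBA machinery plus seven quieter hours already exceeds 85 dBA.
print(round(lex_8h([(95.0, 1.0), (70.0, 7.0)]), 1))  # 86.1
print(round(allowed_hours(88.0), 1))                 # 4.0
```

Under this rule, every 3 dB increase in level halves the permissible exposure time.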
Reduction of noise at source is both difficult to achieve and costly. In contrast, HPDs are readily available, effective and relatively low-cost. The CF has had a hearing conservation program in place since the 1950s. Its components include noise measurement, reduction of noise at source where possible, education on the hazards of noise exposure, use of personal hearing protection, and regular monitoring of hearing [8]. Nonetheless, the cost of claims for noise-induced hearing loss has been escalating steadily. According to Veterans Affairs Canada, the budget for audiological services in 2006 was $41 million for 49,580 individuals. This figure does not include the cost of disability pensions, which would double the total outlay [9]. The Canadian military experience is similar to that of the U.S. In a review of 70,000 audiograms of U.S. Navy and Marine Corps personnel, Bohnker et al. [10] found no evidence of an improvement due to hearing conservation initiatives. The prevalence of hearing loss increased with years of service, and mean values exceeded published age-corrected norms at all ages.

2.2 Impediments to use of hearing protection

Individuals working in high-level ambient noise are reluctant to wear personal hearing protection. This applies in military and civilian occupational settings and in leisure activities. The reasons given are discomfort, difficulty fitting the device, and decreased ability to carry out auditory tasks such as the detection and localization of warning sounds and speech communication [11]. During military operations, degraded situational awareness may compromise the success of the mission and result in casualties. Laboratory studies have confirmed that the issues raised by CF personnel are valid. Problems with comfort and fit relate mainly to earplugs.
Although a wide range of plugs varying in material and shape is readily available for purchase, most are sold in only one size. As well, the user must rely on the instructions on the packaging for the method of inserting the device. Mean real-world sound attenuation is generally significantly less than the manufacturers' specifications [12].

Speech understanding in noise does not appear to be affected in individuals with normal hearing [13]. Speech and noise are attenuated by the same amount, so the speech-to-noise ratio (SNR) remains the same. In individuals with pre-existing hearing loss, however, the attenuation provided by the device adds to their already raised hearing thresholds at the speech frequencies, resulting in a decrement in speech understanding. Sound localization, in contrast, is compromised in both normal-hearing and hearing-impaired listeners [14]. Right-left discrimination depends on the central encoding of interaural differences in time of arrival and intensity, and it is preserved. However, both plugs and muffs interfere with the spectral cues provided by the outer ear, which reduces the accuracy of discriminating frontal from rearward sound sources. Typically, plugs bias the perceived location towards the back, and muffs towards the front.

3.0 HEARING PROTECTION DEVICES (HPDs)

3.1 Conventional HPDs

Conventional HPDs reduce ambient sounds by the same amount regardless of their level. However, the amount varies widely across makes and models, particularly for earplugs. For earmuffs, attenuation increases from about 15 dB at 125 Hz to about 35 dB at 1 kHz and then remains fairly stable. If well fitted, earplugs generally provide relatively more attenuation than earmuffs below 1 kHz (15-25 dB). Above 1 kHz, highly rated earplugs provide about the same attenuation as earmuffs [12]. Low-frequency attenuation may be increased by wearing a muff and plug in combination.
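The level-independent behaviour of conventional HPDs described above can be illustrated with a small sketch. So can the SNR argument of Section 2.2: equal attenuation leaves the SNR unchanged but can push speech below a raised hearing threshold. Only the earmuff anchor values (about 15 dB at 125 Hz, about 35 dB above 1 kHz) come from the text. The log-frequency interpolation, the band levels, and the threshold are illustrative assumptions, and the helper names are hypothetical.

```python
import math


def muff_attenuation_db(freq_hz):
    """Illustrative earmuff attenuation (dB) at a given frequency.

    Anchor values from the text: about 15 dB at 125 Hz and about 35 dB at 1 kHz,
    roughly constant above 1 kHz. Linear interpolation on a log-frequency axis
    and the flat 15 dB below 125 Hz are assumptions.
    """
    if freq_hz <= 125.0:
        return 15.0
    if freq_hz >= 1000.0:
        return 35.0
    fraction = math.log2(freq_hz / 125.0) / math.log2(1000.0 / 125.0)
    return 15.0 + 20.0 * fraction


def level_at_ear_db(level_db, freq_hz):
    """A conventional HPD subtracts the same number of dB whatever the input level."""
    return level_db - muff_attenuation_db(freq_hz)


band_hz = 500.0
speech_db, noise_db = 70.0, 80.0   # hypothetical band levels
threshold_db = 50.0                # hypothetical elevated hearing threshold, same dB scale

snr_open = speech_db - noise_db
snr_protected = level_at_ear_db(speech_db, band_hz) - level_at_ear_db(noise_db, band_hz)
audible_open = speech_db > threshold_db
audible_protected = level_at_ear_db(speech_db, band_hz) > threshold_db
print(f"attenuation {muff_attenuation_db(band_hz):.1f} dB, "
      f"SNR {snr_open:.1f} -> {snr_protected:.1f} dB, "
      f"audible to impaired listener: {audible_open} -> {audible_protected}")
```

The SNR is identical with and without the muff. The hypothetical impaired listener, however, loses audibility of the speech band once about 28 dB of attenuation is added to the raised threshold.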
3.2 Level-dependent HPDs

In contrast, the attenuation provided by level-dependent HPDs depends on the level of the ambient sound. These devices incorporate either limited amplification or active noise reduction (ANR), accomplished using microphones housed in one or both ear cups [15]. In the case of limited amplification, low-level signals may be amplified by up to 10 dB until a pre-set risk criterion is reached (e.g., 82 dBA). Beyond the criterion, sound attenuation increases by 1 dB for every 1 dB increment in sound level until the passive attenuation of the muff (e.g., 35 dB) is reached. In the case of ANR, an electronic circuit housed within the muff samples the incoming waveform, inverts it, and adds the inverted copy to the original. Components of the two waveforms that are out of phase cancel, thereby reducing the overall level. ANR is limited to frequencies below about 1 kHz, which often dominate industrial or military environments (e.g., an aircraft cockpit). ANR is not suitable for reducing impulsive sounds (e.g., blasts and weapon fire), since these events are too brief for the ambient to be sampled. For these noise events, passive level-dependent devices (muffs or plugs) are recommended. These devices contain a precision orifice in an acoustical duct that improves transmission of low-level sounds, so that speech communication is reduced by less than 20 dB [16, 17]. A shock wave (e.g., from a weapon discharge) with a level in the range of 80-120 dB creates turbulent air flow in the orifice; the exact level depends on the manufacturer and model. The turbulence restricts passage through the orifice and so increases the attenuation.

3.3 Advanced communication technologies

A more recent innovation in hearing protector technology is the electronic pass-through hearing protector (EPHP).
This type of device consists of a pair of conventional, level-independent earplugs or earmuffs, bilateral external microphones to pick up the ambient sound, internal speakers to present these sounds to the ears, and an electronic processing unit. The processing unit passes and possibly amplifies low-level sounds, reduces high-level continuous sounds using ANR, and blocks impulsive sounds [18, 19]. In fact, EPHP systems can implement a wide range of filtering and amplification options. Exploring these possibilities is an active area of research, and commercial EPHP systems have begun to appear on the market (for example, from Nacre, Sylinx, and Sensear).

4.0 ENHANCING SPEECH IN NOISY ENVIRONMENTS

To better appreciate the possible scope for EPHP devices in a military context, it is worth considering the general features of a soldier's acoustic environment. It is characterized by several distinctive types of audio signals, as shown in Figure 1. First, there are speech signals, including face-to-face speech, radio-mediated speech, and potentially also 'speech babble' from nearby speakers. The first two are target signals, which we wish to preserve and enhance; the third is generally interference. Second, there is noise from vehicles or machinery in the vicinity. Such sources are usually dominated by low frequencies, and their spectrum may overlap with that of speech [20, 21]. Awareness of machine noise can be an important part of situational awareness, but it is often desirable to reduce its amplitude in order to protect the soldier's hearing and to improve speech comprehension. Third, there are impulsive sounds from weapon fire and explosions. Perception and localization of such sounds are a critical part of situational awareness. Finally, there are other environmental sounds, some of which may be classified as background noise and others as relevant to situational awareness.
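The EPHP behaviour described in Section 3.3 is to pass or amplify low-level sounds, reduce loud continuous sounds, and block impulses. It can be related to the signal categories above through a small per-frame gain rule. This is a hypothetical sketch, not the processing of any commercial device. The impulse thresholds (a 120 dB SPL peak and a 15 dB peak-to-RMS ratio) are illustrative assumptions, levels are unweighted dB SPL rather than dBA, and ANR is not modelled. The non-impulsive branch applies the limited-amplification rule of Section 3.2: a gain of up to 10 dB, but never more than the 82 dB criterion at the ear.

```python
import numpy as np

P_REF = 20e-6  # reference pressure (Pa) for dB SPL


def to_db_spl(pressure_pa):
    """Convert a pressure (Pa) to dB SPL, guarding against log of zero."""
    return 20.0 * np.log10(max(pressure_pa, 1e-12) / P_REF)


def ephp_frame_gain_db(frame_pa, max_gain_db=10.0, criterion_db=82.0,
                       impulse_peak_db=120.0, impulse_crest_db=15.0):
    """Hypothetical pass-through gain (dB) for one frame of sound pressure samples.

    - Impulsive frame (high peak and high peak-to-RMS ratio): mute the pass-through
      path, so only the passive attenuation of the earpiece applies.
    - Otherwise: amplify by up to max_gain_db, but never deliver more than criterion_db.
    """
    rms_db = to_db_spl(float(np.sqrt(np.mean(frame_pa ** 2))))
    peak_db = to_db_spl(float(np.max(np.abs(frame_pa))))
    if peak_db >= impulse_peak_db and peak_db - rms_db >= impulse_crest_db:
        return -np.inf
    return min(max_gain_db, criterion_db - rms_db)


fs = 16000
t = np.arange(1024) / fs
quiet_speech_like = 0.02 * np.sin(2 * np.pi * 200 * t)  # about 57 dB SPL
machine_noise = 2.0 * np.sin(2 * np.pi * 100 * t)        # about 97 dB SPL, low frequency
gunshot = np.zeros(1024)
gunshot[100] = 2000.0                                     # single-sample spike, 160 dB SPL peak

for name, frame in [("speech-like", quiet_speech_like),
                    ("machine", machine_noise), ("impulse", gunshot)]:
    print(name, ephp_frame_gain_db(frame))
```

The quiet frame receives the full +10 dB gain, and the loud continuous tone is turned down to about 82 dB at the ear. The spike is muted.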
Figure 1: The acoustic environment of the soldier. Face-to-face communication competes with impulse noises, machine noises, radio communications, and other background sounds.

The hearing protection technologies discussed in Section 3 affect the soldier's perception of this acoustic environment in various ways. Conventional HPDs decrease the amplitude of all sources indiscriminately. Though they are generally more effective at suppressing high frequencies [12], they are otherwise unable to make distinctions based on directionality, amplitude, source type, or other criteria. They can be effective at protecting hearing, but for the same reason they impair situational awareness. Level-dependent HPDs begin to discriminate between different types of sources. Limited-amplification HPDs can improve situational awareness in quiet environments by amplifying environmental sounds and speech, but in loud backgrounds they function much like conventional HPDs. HPDs with ANR technology preferentially attenuate low-frequency noise, which makes them most effective at reducing machine noise. They are not effective at identifying and improving face-to-face speech communication. Nor, as mentioned above, do they provide protection against impulsive sounds. EPHP technology widens the range of possibilities. Because the signals reach the listener only after passing through electronic auditory filters, the re-presentation of the auditory environment to the listener is limited only by the ingenuity of the filtering algorithms. A general aim of current research is to develop an EPHP system that provides better discrimination between target signals and interference. The system should be able to focus on signals of interest and reduce interference from competing sources.
Stated in this way, the problem is a variant of the cocktail party problem, a challenging problem in psychoacoustics first defined by Cherry over half a century ago [22]. Cherry noted the remarkable ability of human listeners to isolate and track a particular audio signal within a complex acoustic environment, such as the voice of an interlocutor against the background of voices and music at a cocktail party. The cocktail party problem has two parts. The first is understanding how the human auditory system divides the acoustic signal impinging on the ear into audio streams originating from a finite number of distinct sources. The second is designing an automated computer system capable of performing the same task.

From a signal processing point of view, the cocktail party problem is challenging. Many attempts have been made to solve it, although by general agreement it remains an open problem (see [23] for a recent review). The difficulty derives primarily from three factors: the highly non-stationary spectral characteristics of both the target and interference signals, the spectral overlap between target and interference, and the possible presence of reverberation, echo, and other complicating factors. Of the various approaches proposed for the cocktail party problem, auditory scene analysis [24] stands out as one of the most promising, and it inspired our method. In this approach, the incoming signal is segregated into streams on the basis of a set of auditory cues, such as onset time or synchronized harmonic shifts. The particular cues employed in our system are discussed in more detail below. Once the cue-derived audio streams are established, the stream of interest can be amplified and the others attenuated.
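The final step above, amplifying the stream of interest and attenuating the others, is commonly implemented by weighting each time-frequency unit of the mixture with a mask value between 0 and 1. The sketch below makes three simplifying assumptions. It uses a short-time Fourier transform instead of a cochlear filterbank, for brevity. It builds an "oracle" mask from the clean target and noise, which a real system never has; a system such as the one described next must estimate the mask from auditory cues. Finally, it measures SNR in the time-frequency domain. The helper names are hypothetical.

```python
import numpy as np


def stft(x, frame_len=256, hop=128):
    """Hann-windowed short-time Fourier transform, shape (frames, frequency bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def snr_db(reference, estimate):
    """SNR of an estimate against a reference, computed in the time-frequency domain."""
    error = estimate - reference
    return 10 * np.log10(np.sum(np.abs(reference) ** 2) / np.sum(np.abs(error) ** 2))


rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
target = 0.5 * np.sin(2 * np.pi * 440 * t)          # stand-in for the stream of interest
interference = 0.5 * rng.standard_normal(len(t))    # stand-in for background noise

T, N = stft(target), stft(interference)
X = T + N                                           # the STFT is linear: this is the mixture's STFT

# Oracle soft mask: share of each time-frequency unit's energy that belongs to the target.
mask = np.abs(T) ** 2 / (np.abs(T) ** 2 + np.abs(N) ** 2 + 1e-12)
Y = mask * X                                        # keep target-dominated units, attenuate the rest

snr_in, snr_out = snr_db(T, X), snr_db(T, Y)
print(f"SNR before masking: {snr_in:.1f} dB, after: {snr_out:.1f} dB")
```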
4.1 Fuzzy Cocktail Party Processor

Defence Research & Development Canada has recently initiated a project to develop an EPHP system for noisy military environments. The system is to provide hearing protection and radio communication while enhancing face-to-face speech communication and the localization of impulsive sources. In this section we describe the design of this system, discuss the basic technical approach, and summarize early performance indicators. The system has two independent components: a speech enhancement unit and an impulse localization unit. Both units use helmet-mounted directional microphone arrays, and both are designed to meet a few basic requirements. The system must be wearable, and so must make limited computational and power demands. The signal processing must also be carried out in real time, which is a significant constraint. Finally, the system must be adaptive in order to perform robustly under complex, dynamic acoustic conditions. We describe the two major components of the system separately.

4.1.1 Speech enhancement unit

The speech enhancement unit has four main components: a microphone array to collect the ambient audio signal, a signal processing system to filter the signal and enhance speech, an active noise reduction (ANR) component, and a hearing-protective earpiece to deliver the processed signal to the wearer. The microphone array has four directional microphones located in pairs near the ears, each pair consisting of one forward-facing and one rear-facing microphone. The speech enhancement and ANR components work co-operatively: as the SNR increases past the point where the speech enhancement unit performs effectively, it is gradually replaced by the ANR system. The hearing-protective earpiece will also be interfaced with the soldier's radio communication unit.
The signal processing system, called the Fuzzy Cocktail Party Processor (FCPP), is the main innovative component of the system. It has been developed primarily by Karl Wiklund at McMaster University [25]. Its basic architecture is shown in Figure 2. The input to the system is the four-channel digital signal obtained from the directional microphones. The central processing blocks are book-ended by cochlear filterbanks. The first produces the time-frequency representation of the signal before processing; the second reconstructs the time-domain signal delivered to the listener. A cochlear filterbank consists of a set of bandpass gammatone filters [26]. It mimics the frequency decomposition performed by the human ear and can be implemented efficiently [27].

Figure 2: Fuzzy Cocktail Party Processor (FCPP) [25].

The next block, which performs cue estimation and mask calculations, is the heart of the system. In the spirit of auditory scene analysis, a set of auditory cues is used to assess the probability that a given time-frequency component of the signal belongs to the target (speech) signal. This probability is then applied as a mask on the time-frequency plane to enhance the target signal relative to the background. The auditory cues used by the system are onset, pitch, interaural time-of-arrival difference (ITD), and interaural level difference (ILD).

Onset refers to the time at which a new sound is introduced into the environment. Frequency components with correlated onsets are likely to originate from the same sound source. Because it focuses on the first appearance of a new source, onset is fairly robust against reverberation. Pitch is a cue specifically associated with speech: vowels in voiced speech give rise to a periodic pulse pattern that occurs not only at the fundamental frequency but also across its harmonics.
The presence of such correlated periodicity across frequency bands is a good indication that the bands have a common origin and should be grouped together. Both onset and pitch are monaural cues, and so do not provide directional information. Directionality is derived from the ITD and ILD cues, which are binaural. ITD, the difference between the times at which a sound reaches the two ears, depends on the azimuthal position of the source. Similarly, ILD reflects the fact that a signal is somewhat louder at the ear nearer the source than at the farther one. Because our system is intended to enhance face-to-face communication, we use the directional cues to preferentially enhance sources in the forward direction. Later versions of the system could preferentially enhance sources in an arbitrary direction, perhaps selected by an eye-tracking system.

These cues are not given equal weight in the analysis. Because of their robustness in complex acoustic environments, the onset and pitch cues are given priority, with the other cues acting as constraints on signal source assignments. The cues are applied to the auditory stream using a fuzzy logic system [25]. In fuzzy logic, assertions are not strictly true or strictly false; instead, they take graded truth values between 0 and 1. Fuzzy reasoning rules are based on linguistic statements that capture the basic intuitive principles of the analysis. For instance, a rule might state the following: if most cues are consistent with a source directly in front of the wearer, and the characteristics of the sound are likely to be associated with speech, then there is probably a speaker in front of the wearer. The fuzzy logic system produces a probability that a given time-frequency unit originates from the target, and this probability is applied as a mask to enhance probable targets. The fuzzy logic approach has the merits of simplicity and computational efficiency.
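The combination of cues described above can be sketched for a single time-frequency unit. The sketch estimates ITD from the peak of the interaural cross-correlation and ILD from the interaural energy ratio. It maps both to a "frontal" fuzzy membership and combines that with a speech-likeness value derived from the onset and pitch cues. This is a hypothetical rule written to illustrate the idea. It is not the FCPP rule base of [25]. The triangular memberships, their widths (0.2 ms and 6 dB), the max/min operators, and the assumption that onset and pitch strengths arrive precomputed in [0, 1] are all illustrative choices. The band-pass filtering that would produce a single channel is omitted.

```python
import numpy as np


def itd_ild(left, right, fs, max_lag_s=0.0008):
    """ITD (s) and ILD (dB) of one binaural frame in a single frequency channel.

    ITD is the lag that maximizes the cross-correlation within +/- max_lag_s;
    a positive ITD means the left-ear signal lags the right-ear one.
    ILD is the left-to-right energy ratio in dB.
    """
    max_lag = max(1, int(round(max_lag_s * fs)))
    core = slice(max_lag, len(left) - max_lag)  # region unaffected by np.roll wrap-around
    lags = np.arange(-max_lag, max_lag + 1)
    corr = [np.dot(left[core], np.roll(right, lag)[core]) for lag in lags]
    itd = lags[int(np.argmax(corr))] / fs
    ild = 10 * np.log10(np.sum(left ** 2) / np.sum(right ** 2))
    return itd, ild


def triangular(x, width):
    """Fuzzy membership: 1 at x = 0, falling linearly to 0 at |x| = width."""
    return max(0.0, 1.0 - abs(x) / width)


def frontal_speech_mask(itd, ild, onset, pitch, itd_width=2e-4, ild_width=6.0):
    """Mask value in [0, 1] for one time-frequency unit (hypothetical rule)."""
    speech_like = max(onset, pitch)                                  # fuzzy OR of monaural cues
    frontal = 0.5 * (triangular(itd, itd_width) + triangular(ild, ild_width))
    return min(speech_like, frontal)                                 # fuzzy AND: direction constrains


fs = 16000
rng = np.random.default_rng(1)
source = rng.standard_normal(1024)
delay = 5                                                         # samples, about 0.31 ms
left = 0.7 * np.concatenate([np.zeros(delay), source[:-delay]])   # far ear: later and quieter
right = source
itd, ild = itd_ild(left, right, fs)
print(f"ITD {itd * 1e6:.0f} us, ILD {ild:.1f} dB")
print("frontal unit:", frontal_speech_mask(0.0, 0.0, onset=0.2, pitch=0.9))
print("lateral unit:", frontal_speech_mask(itd, ild, onset=0.2, pitch=0.9))
```

Using min for the final AND lets the directional cues cap, but never raise, the speech evidence. This is one simple way to give onset and pitch priority while treating ITD and ILD as constraints.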
All of the auditory cues used in the preceding analysis are front-back symmetric. The spectral subtraction block in Figure 2 is therefore used to distinguish sources in front of the wearer from those behind. Recall that two oppositely oriented directional microphones are located near each ear. The signals obtained from the rear-facing microphones are subtracted from those obtained from the forward-facing microphones [21]. Finally, an adaptation control block adjusts the parameters of the system in response to changes in the acoustic environment. The signal is then converted back to a time series and delivered to the listener.

4.1.2 Impulse localization unit

The impulse localization unit is also helmet-mounted but operates independently of the speech enhancement unit. It consists of eight directional microphones uniformly distributed around the perimeter of the helmet. Localization is performed by comparing the times of arrival of incident impulsive acoustic peaks at the individual microphones. At present we assume incident plane waves, an approximation that is most accurate for sources in the far field. The system localizes in the azimuthal plane, but not in elevation or range, and no attempt is made to identify the weapon from which the impulsive sound originated. The direction of incidence computed by the prototype system is indicated to the wearer on a hand-held visual display.
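The plane-wave time-of-arrival comparison described above can be sketched for a ring of eight microphones. Several details are assumptions for illustration: the helmet radius (0.12 m), the speed of sound, the convention of measuring azimuth counter-clockwise from microphone 0, and the closed-form estimator. The estimator exploits the uniform spacing of the microphones. The paper does not describe the prototype's actual estimator, and the detection of the impulsive peaks themselves is omitted here.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, air at about 20 degrees C


def mic_angles(n_mics=8):
    """Angular positions (rad) of microphones spaced uniformly around the helmet."""
    return 2 * np.pi * np.arange(n_mics) / n_mics


def simulate_arrival_times(azimuth_deg, radius_m=0.12, n_mics=8, t0=0.0,
                           jitter_s=0.0, rng=None):
    """Plane-wave arrival times: microphones facing the source receive the impulse first."""
    theta = np.deg2rad(azimuth_deg)
    times = t0 - (radius_m / SPEED_OF_SOUND) * np.cos(mic_angles(n_mics) - theta)
    if jitter_s > 0:
        if rng is None:
            rng = np.random.default_rng()
        times = times + rng.normal(0.0, jitter_s, size=n_mics)
    return times


def estimate_azimuth_deg(arrival_times):
    """Azimuth (deg, 0-360) of a plane wave from arrival times at a uniform circular array.

    With t_i = t0 - (r / c) cos(phi_i - theta) and uniformly spaced phi_i (at least
    3 microphones), sum_i t_i exp(j phi_i) = -(r / c)(N / 2) exp(j theta): the unknown
    emission time t0 cancels and the phase of the sum gives theta directly.
    """
    phi = mic_angles(len(arrival_times))
    z = -np.sum(np.asarray(arrival_times) * np.exp(1j * phi))
    return float(np.rad2deg(np.angle(z)) % 360.0)


clean = estimate_azimuth_deg(simulate_arrival_times(60.0, t0=0.01))
noisy = estimate_azimuth_deg(simulate_arrival_times(60.0, t0=0.01, jitter_s=5e-6,
                                                    rng=np.random.default_rng(2)))
print(f"clean: {clean:.2f} deg, with 5 us timing jitter: {noisy:.2f} deg")
```

Because only time differences matter, the unknown time at which the weapon fired never needs to be known. This is consistent with localizing in azimuth alone: range would require curvature of the wavefront, which the plane-wave model discards.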